Abstract

Multiple independent radio frequency (RF) beams find applications in communications, radio
astronomy, radar, and microwave imaging. An $N$-point FFT applied spatially across an
array of receiver antennas provides $N$ independent RF beams at $\frac{N}{2}\log_2 N$ multiplier
complexity. Here, a low-complexity multiplierless approximation for the 8-point FFT is presented for
RF beamforming, using only 26 additions. The algorithm provides eight beams that closely
resemble the antenna array patterns of the traditional FFT-based beamformer, albeit without
using multipliers. The proposed FFT-like algorithm is useful for low-power RF multi-beam
receivers; it was synthesized in 45 nm CMOS technology at a 1.1 V supply and verified on-chip
using a Xilinx Virtex-6 Lx240T FPGA device. The CMOS simulation and FPGA implementation
indicate bandwidths of 588 MHz and 369 MHz, respectively, for each of the independent
receive-mode RF beams.


1 Introduction 

Antenna-array-based radio frequency (RF) applications such as radar, wireless communications,
localization, remote sensing, signal intelligence, radio astronomy, search for extraterrestrial
intelligence (SETI), and imaging require the fundamental operation of receive-mode beamforming.
Beamforming is the directional enhancement of propagating electromagnetic plane waves
based on their directions of arrival (DOA), whilst suppressing undesired noise and interference
that impinge on the antenna array. The ability to form multiple receiver beams is known as
“multi-beamforming”. Multiple RF beams, each having a unique “look direction” (the direction of
maximum sensitivity), are needed for multiple visibilities.

Figure 1: ULA-based multi-beamformer using a spatial FFT.

Multiple simultaneous beams are also needed for search-and-track radar, which in volume-scan
mode continuously monitors airborne threats, such as aircraft, warheads, and cruise missiles,
across a given range of angles. From the standpoint of high-capacity wireless communications,
simultaneous receiver beams are of importance to multi-input multi-output (MIMO) systems. The
application of an $N$-point fast Fourier transform (FFT) spatially, at each time sample, across a
uniform linear array (ULA) of antennas is a technique for achieving a plurality of independent RF
beams [1], [2]. The FFT efficiently computes the discrete Fourier transform (DFT) with
$\frac{N}{2}\log_2 N$ multiplications. Fig. 1 shows an overview of a ULA-based multi-beamformer
using a spatial FFT. For an $N$-element ULA, the spatial FFT beamformer provides $N$ beams, each
uniformly spaced in the frequency domain by the interval $2\pi/N$. The signal is first sent through
a low noise amplifier (LNA), and the real (I, in-phase, $V_{\mathrm{real}}$) and imaginary
(Q, quadrature, $V_{\mathrm{im}}$) parts are low-pass filtered and sampled using analog-to-digital
converters (ADCs) before application of the DFT. The spatial angle $\psi$ is the independent
variable used in the polar array beam-patterns.
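
The spatial DFT described above can be illustrated with a minimal sketch. It assumes an idealized, noise-free snapshot of an 8-element ULA illuminated by a single plane wave whose spatial frequency falls exactly on one of the $2\pi/N$-spaced beam centres; the function name `spatial_dft` and the choice of beam 3 are hypothetical and serve only this illustration.

```python
import cmath
import math

def spatial_dft(snapshot):
    """Direct DFT across the array elements for a single time sample."""
    n = len(snapshot)
    return [
        sum(snapshot[m] * cmath.exp(-2j * math.pi * m * k / n) for m in range(n))
        for k in range(n)
    ]

N = 8                # number of ULA elements, and therefore of beams
target_bin = 3       # hypothetical beam chosen for this demonstration

# Assumed plane wave whose spatial frequency sits exactly on beam 3.
omega = 2 * math.pi * target_bin / N
snapshot = [cmath.exp(1j * omega * m) for m in range(N)]

# One output per beam; only the beam matched to the wave should respond.
beam_magnitudes = [abs(b) for b in spatial_dft(snapshot)]
strongest_beam = max(range(N), key=lambda k: beam_magnitudes[k])
print(strongest_beam, [round(x, 6) for x in beam_magnitudes])
```

In this idealized case all the energy lands in the matched beam and the other seven outputs are numerically zero, which is the sense in which the $N$ beams are independent.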

RF aperture power consumption is directly proportional to circuit complexity and clock
frequency. Because multiplier hardware dominates circuit complexity, FFT hardware with as few
parallel multiplier circuits as possible is preferable for reducing the overall circuit
complexity and power consumption of the multi-beamformer. The proposed fast algorithm
approximates the FFT computation without using any multipliers at all, making the corresponding
digital architecture very simple to realize on-chip. Because the proposed fast algorithm
requires only 26 addition operations, the corresponding architecture consumes less power than
usual FFT-based circuits, which rely on parallel multipliers to implement the twiddle factors.


2 Multiplier-Free DFT Approximation 


The DFT is a linear orthogonal transformation relating an $N$-point input vector
$\mathbf{v} = [v_0, v_1, \ldots, v_{N-1}]^\top$ to an output vector
$\mathbf{V} = [V_0, V_1, \ldots, V_{N-1}]^\top$ according to
$V_k = \sum_{n=0}^{N-1} \omega_N^{nk} \, v_n$, $k = 0, 1, \ldots, N-1$, where
$\omega_N = \exp\{-2\pi j/N\}$ is the $N$th root of unity [3] and $j = \sqrt{-1}$. In matrix
formalism, the above expression reduces to $\mathbf{V} = \mathbf{F}_N \cdot \mathbf{v}$, where
$\mathbf{F}_N$ is the DFT matrix, whose $(i,k)$-th element is given by $f_{i,k} = \omega_N^{ik}$
for $i, k = 0, 1, \ldots, N-1$. The direct DFT computation requires $N^2$ complex multiplications
and $N \cdot (N-1)$ additions. Thus, fast algorithms are necessary and are often able to reduce
the computational cost of the DFT to $O(N \cdot \log_2 N)$ multiplications [4].
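
As a sanity check on these operation counts, the following sketch evaluates the direct matrix-vector DFT for $N = 8$ while counting complex operations. The conversion to real operations assumes the textbook cost of four real multiplications and two real additions per complex multiplication, and two real additions per complex addition; that convention is an assumption of this sketch, although it does reproduce the "Exact DFT" entries of Table 1.

```python
import cmath
import math

def direct_dft(v):
    """Direct matrix-vector DFT that also counts complex operations."""
    n = len(v)
    w = cmath.exp(-2j * math.pi / n)      # omega_N
    mults = adds = 0
    out = []
    for i in range(n):
        acc = (w ** (i * 0)) * v[0]       # first product of row i
        mults += 1
        for k in range(1, n):
            acc += (w ** (i * k)) * v[k]  # one product and one accumulation
            mults += 1
            adds += 1
        out.append(acc)
    return out, mults, adds

_, complex_mults, complex_adds = direct_dft([1, 2, 3, 4, 5, 6, 7, 8])

# Assumed cost model: complex mult = 4 real mults + 2 real adds,
# complex add = 2 real adds.
real_mults = 4 * complex_mults
real_adds = 2 * complex_mults + 2 * complex_adds
print(complex_mults, complex_adds, real_mults, real_adds)
```

For $N = 8$ this gives $N^2 = 64$ complex multiplications and $N(N-1) = 56$ complex additions, that is, 256 real multiplications and 240 real additions under the stated cost model.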

We submitted the 8-point DFT matrix $\mathbf{F}_8$ to the parametric optimization method
described in [5] to derive a matrix approximation. Two major constraints were imposed on the sought
approximations: (i) near-orthogonality and (ii) low complexity. We thus obtained that the optimal
elements for the parametric approximation of $\mathbf{F}_8$ are $1$, $(1-j)/2$, and $-j$. Such
parameters result in the following matrix approximation:

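\[
\hat{\mathbf{F}}_8 = \frac{1}{2} \cdot
\begin{bmatrix}
2 & 2 & 2 & 2 & 2 & 2 & 2 & 2 \\
2 & 1-j & -2j & -1-j & -2 & -1+j & 2j & 1+j \\
2 & -2j & -2 & 2j & 2 & -2j & -2 & 2j \\
2 & -1-j & 2j & 1-j & -2 & 1+j & -2j & -1+j \\
2 & -2 & 2 & -2 & 2 & -2 & 2 & -2 \\
2 & -1+j & -2j & 1+j & -2 & 1-j & 2j & -1-j \\
2 & 2j & -2 & -2j & 2 & 2j & -2 & -2j \\
2 & 1+j & 2j & -1+j & -2 & -1-j & -2j & 1-j
\end{bmatrix}
\]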
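
The relation between the matrix above and the exact DFT matrix can be made concrete with a short sketch. It builds $\mathbf{F}_8$ and replaces every real or imaginary component equal to $\pm 1/\sqrt{2}$ by $\pm 1/2$. This snapping rule is only a convenient way to reproduce the listed entries; it is not the optimization procedure of [5], and the helper names are hypothetical.

```python
import cmath
import math

def exact_dft8():
    """Exact 8-point DFT matrix with entries exp(-2*pi*j*i*k/8)."""
    return [[cmath.exp(-2j * math.pi * i * k / 8) for k in range(8)]
            for i in range(8)]

def snap(x):
    """Map a component in {0, +-1, +-1/sqrt(2)} to {0, +-1, +-1/2}."""
    if abs(x) < 1e-9:
        return 0.0
    if abs(abs(x) - 1.0) < 1e-9:
        return math.copysign(1.0, x)
    return math.copysign(0.5, x)

def approx_dft8():
    """Multiplierless approximation: only 1/sqrt(2) components are altered."""
    return [[complex(snap(z.real), snap(z.imag)) for z in row]
            for row in exact_dft8()]

F = exact_dft8()
F_hat = approx_dft8()

# Row 1 scaled by 2, for comparison with the second row of the matrix above.
row1_times_2 = [2 * z for z in F_hat[1]]

# Count the entries that differ from the exact DFT matrix.
changed = sum(1 for i in range(8) for k in range(8)
              if abs(F[i][k] - F_hat[i][k]) > 1e-9)
print(row1_times_2, changed)
```

Only the 16 entries in odd rows and odd columns change; every other entry is $\pm 1$ or $\pm j$ and is already free of multiplications.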

Compared to the exact DFT matrix, the above approximation has a mean squared error of 0.686,
which is considered low. Although not exactly orthogonal, the proposed approximation is very
close to orthogonality. Considering the deviation from orthogonality measure [6], the proposed
transform displayed a deviation of 0.03, whereas, in comparison, the popular non-orthogonal DCT
approximation SDCT [7] has a deviation from orthogonality of 0.20.
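
One way to see how close the approximation is to orthogonality is to inspect the Gram matrix $\hat{\mathbf{F}}_8 \hat{\mathbf{F}}_8^{H}$ of its rows, which would be diagonal for an orthogonal transform. The sketch below is illustrative only: it does not implement the deviation measure of [6], whose definition is not reproduced in this text, and the variable names are hypothetical.

```python
# Twice the approximate matrix, copied from the text (Gaussian-integer entries).
T = [
    [2, 2, 2, 2, 2, 2, 2, 2],
    [2, 1-1j, -2j, -1-1j, -2, -1+1j, 2j, 1+1j],
    [2, -2j, -2, 2j, 2, -2j, -2, 2j],
    [2, -1-1j, 2j, 1-1j, -2, 1+1j, -2j, -1+1j],
    [2, -2, 2, -2, 2, -2, 2, -2],
    [2, -1+1j, -2j, 1+1j, -2, 1-1j, 2j, -1-1j],
    [2, 2j, -2, -2j, 2, 2j, -2, -2j],
    [2, 1+1j, 2j, -1+1j, -2, -1-1j, -2j, 1-1j],
]

def gram(t):
    """Gram matrix (T/2)(T/2)^H of the rows of the approximation."""
    n = len(t)
    return [[sum(complex(t[i][k]) * complex(t[m][k]).conjugate()
                 for k in range(n)) / 4
             for m in range(n)]
            for i in range(n)]

G = gram(T)
diagonal = [G[i][i].real for i in range(8)]

# Off-diagonal entries that do not vanish, keyed by (row, column).
off_diagonal = {(i, m): G[i][m]
                for i in range(8) for m in range(8)
                if i != m and abs(G[i][m]) > 1e-12}
print(diagonal, off_diagonal)
```

The even rows keep the squared norm 8 of the exact DFT, the odd rows have squared norm 6, and the only non-zero cross terms are between rows 1 and 5 and between rows 3 and 7, each equal to 2.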

The proposed approximate matrix $\hat{\mathbf{F}}_8$ preserves the symmetry of the DFT and has
null multiplicative complexity. Since its direct computation still requires 64 additions and 32
bit-shifting operations, a further reduction in the additive complexity can be obtained by means
of a tailored fast algorithm. Let $\mathbf{I}_n$ be the identity matrix of order $n$ and
$\mathbf{B}_n = \left[\begin{smallmatrix} 1 & 1 \\ 1 & -1 \end{smallmatrix}\right] \otimes \mathbf{I}_{n/2}$,
where $\otimes$ denotes the Kronecker product. Thus,
employing the matrix factorization methods suggested in [3], we have the following fast algorithm:



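\[
\hat{\mathbf{F}}_8 = \mathbf{P} \times \operatorname{diag}(\mathbf{I}_2, \mathbf{A}_1, \mathbf{A}_3)
\times \mathbf{D}_2 \times \operatorname{diag}(\mathbf{B}_2, \mathbf{I}_2, \mathbf{A}_4)
\times \mathbf{D}_1 \times \operatorname{diag}(\mathbf{B}_4, \mathbf{A}_2) \times \mathbf{B}_8,
\]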


where $\mathbf{A}_1 = [\,\ldots\,]$, $\mathbf{A}_2 = \ldots$ [...] permutation matrix, and
$\mathbf{e}_i$ is the 8-point column vector with element 1 at the $i$th position and
0 elsewhere. Figure 2 depicts the signal flow graph of the introduced algorithm. The arithmetic
complexity assessment in terms of real operations is summarized and compared in Table 1.
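
The butterfly matrices $\mathbf{B}_n$ that appear in the factorization can be sketched directly from their Kronecker-product definition. The example below builds $\mathbf{B}_8$ and applies it to a test vector; the helper names `kron`, `identity`, and `butterfly` are hypothetical. The remaining factors of the fast algorithm are not reproduced, so this is only the first stage and not the complete 26-addition algorithm.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]

def identity(n):
    return [[1 if r == c else 0 for c in range(n)] for r in range(n)]

def butterfly(n):
    """B_n = [[1, 1], [1, -1]] (Kronecker) I_{n/2}."""
    return kron([[1, 1], [1, -1]], identity(n // 2))

def apply(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

B8 = butterfly(8)
v = [0, 1, 2, 3, 4, 5, 6, 7]
stage_output = apply(B8, v)

# Every row has exactly two non-zero entries (+1 or -1), so the stage costs
# one addition or subtraction per output and no multiplications.
nonzeros_per_row = [sum(1 for m in row if m != 0) for row in B8]
print(stage_output, nonzeros_per_row)
```

The first four outputs are the sums $v_i + v_{i+4}$ and the last four are the differences $v_i - v_{i+4}$, so $\mathbf{B}_8$ costs eight additions for an eight-point input.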

Figure 2: Signal flow graph for the factorization of $\hat{\mathbf{F}}_8$. Input data $v_i$,
$i = 0, 1, \ldots, 7$, relates to the output $V_k$, $k = 0, 1, \ldots, 7$. Dotted arrows represent
multiplications by $-1$.


Table 1: Real operation assessment and comparison

Method                      Multiplications   Additions   Shifts
Exact DFT                   256               240         0
FFT (complex input) [3]     4                 52          0
FFT (real input) [3]        2                 26          0
Proposed (complex input)    0                 52          4
Proposed (real input)       0                 26          2


Each row $i$ of matrix $\mathbf{F}_8$ may be interpreted as the coefficients of a discrete filter
whose transfer function is $H_i(\omega) = \sum_{k=0}^{7} f_{i,k} \cdot \exp(-jk\omega)$,
$i = 0, 1, \ldots, 7$, for $\omega \in [-\pi, \pi]$ [3]. In the case of multi-beamforming, the exact
or approximate DFT is applied spatially across a ULA of antennas. Here, the variable $\omega$ is
the spatial frequency across the ULA. Let the normalized temporal frequency of the incident plane
wave be $\omega_t \le \pi$. From physics, we have that $\omega = -\omega_t \sin\psi$, for
$-\pi/2 \le \psi \le \pi/2$, measured counter-clockwise from ULA broadside. We set
$\omega_t = \pi$, which corresponds to $\psi \in [-\pi/2, \pi/2]$. Thus, the array patterns are
given by:

\[
P_i(\psi; \mathbf{F}_8) = \frac{\left| H_i(-\omega_t \sin\psi) \right|}{p_i},
\]

where $p_i = \max_{\psi} |H_i(-\omega_t \sin\psi)|$, for $i = 0, 1, \ldots, 7$, is a normalization
factor. Mutatis mutandis, the array patterns based on the proposed approximation are denoted by
$P_i(\psi; \hat{\mathbf{F}}_8)$, $i = 0, 1, \ldots, 7$. Figure 3(a)-(b) shows the array patterns
associated with each row of $\mathbf{F}_8$ and $\hat{\mathbf{F}}_8$. The eight independent beams
are pointed at angles $\psi_k = 0.00, \pm 14.47, \pm 30.00, \pm 48.59, 90.00$ degrees, measured
from the array broadside direction, as expected from the conventional DFT beamformer. To quantify
the difference between corresponding array patterns, we considered the following error function:


\[
D_i(\psi) = P_i(\psi; \mathbf{F}_8) - P_i(\psi; \hat{\mathbf{F}}_8), \quad i = 0, 1, \ldots, 7.
\]
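
The array patterns and the error function can be evaluated numerically with the sketch below. It assumes a 0.1-degree sampling grid for $\psi$ and approximates the normalization factor by the maximum over that grid; both are choices of this illustration. The approximate rows are generated by scaling the odd-row, odd-column DFT entries by $1/\sqrt{2}$, which yields the $(\pm 1 \pm j)/2$ entries of $\hat{\mathbf{F}}_8$. All function names are hypothetical.

```python
import cmath
import math

def dft8_rows(approximate=False):
    """Rows of the exact 8-point DFT matrix or of its approximation."""
    rows = []
    for i in range(8):
        row = []
        for k in range(8):
            f = cmath.exp(-2j * math.pi * i * k / 8)
            if approximate and i % 2 and k % 2:
                f /= math.sqrt(2)   # (+-1 +- j)/sqrt(2) becomes (+-1 +- j)/2
            row.append(f)
        rows.append(row)
    return rows

# Assumed sampling grid for psi, in degrees from broadside.
PSI_DEG = [-90 + 0.1 * m for m in range(1801)]

def pattern(row, omega_t=math.pi):
    """Normalized |H_i(-omega_t sin(psi))| sampled on PSI_DEG."""
    mags = []
    for deg in PSI_DEG:
        omega = -omega_t * math.sin(math.radians(deg))
        h = sum(f * cmath.exp(-1j * k * omega) for k, f in enumerate(row))
        mags.append(abs(h))
    peak = max(mags)
    return [m / peak for m in mags]

def peak_angle(p):
    """Grid angle (degrees) at which a sampled pattern is largest."""
    return PSI_DEG[max(range(len(p)), key=lambda m: p[m])]

exact = [pattern(r) for r in dft8_rows()]
approx = [pattern(r) for r in dft8_rows(approximate=True)]

# Pointwise error function D_i(psi) for every row i.
error = [[pe - pa for pe, pa in zip(e, a)] for e, a in zip(exact, approx)]
max_abs_error = [max(abs(d) for d in row) for row in error]
print(peak_angle(exact[1]), peak_angle(approx[1]), max_abs_error)
```

Under these assumptions the error vanishes identically for even $i$, because the even rows of the two matrices coincide, and the beam for $i = 1$ keeps its look direction near 14.47 degrees in both cases.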


In Figure 3(c), the polar plot of $D_i(\psi)$ for all rows of $\hat{\mathbf{F}}_8$ is displayed.
The error energy can be obtained by integrating $D_i(\psi)$.

This computation furnished $\epsilon_i = 1.08$ for odd $i$ and $\epsilon_i = 0$ for even $i$. The
total error energy is 4.32. For comparison, the approximate DCT described in [8] has a total error
energy of 4.12.


3 FPGA Realization and ASIC Synthesis 

The proposed multiplierless architecture was realized on digital hardware using an ML-605 Xilinx
Virtex-6 field programmable gate array (FPGA) prototyping board. The design was built and
tested for 16-bit inputs via the JTAG interface. Moreover, it was pipelined to minimize the
critical-path delay ($T_{\mathrm{cpd}}$), which in turn maximizes the frequency of operation and
the RF bandwidth. The on-FPGA measured results verified the performance of the proposed
architecture. The FPGA resource consumption, including the number of slices, look-up tables
(LUTs), and flip-flops (FFs), is presented in Table 2. The percentage utilization of the available
resources is also shown. The pipelined design offered a maximum frequency of 739 MHz,
corresponding to a maximum RF bandwidth of 369 MHz for each of the eight beams.

The FPGA-based digital design was imported to Cadence RTL Compiler for application-specific
integrated circuit (ASIC) synthesis using 45 nm complementary metal oxide semiconductor (CMOS)
technology, for an operating voltage of 1.1 V at 27°C. Table 3 displays the area, power, critical
path delay, and maximum frequency of operation at the synthesis stage. The area-time (AT) and
area-time$^2$ (AT$^2$) complexities are also reported. The CMOS synthesis shows an increase in the
maximum clock frequency when compared to the FPGA implementation.

(a) DFT-based    (b) Proposed    (c) Error

Figure 3: Polar plots of $P_i(\psi; \mathbf{F}_8)$, $i = 0, 1, \ldots, 7$,
$\psi \in [-\pi/2, \pi/2]$, at the frequency $\omega_t = \pi$ for the (a) exact transform
$\mathbf{F}_8$, (b) proposed approximate transform $\hat{\mathbf{F}}_8$, and (c) error measure
$D_i(\psi)$.


Table 2: FPGA resource consumption

Resources                Proposed
Slice Registers          3064 (1%)
Slice LUTs               2044 (1%)
Occupied Slices          620 (1%)
LUT-FF Pairs             2335 (1%)
Bonded IOBs              2 (1%)
T_cpd (ns)               1.353
Max. Frequency (MHz)     739.09


Table 3: ASIC synthesis results

Resources                Proposed
Area (mm^2)              0.064
Dynamic Power (mW)       94.18
Static Power (mW)        0.41
Total Power (mW)         94.59
T_cpd (ns)               0.85
Max. Frequency (GHz)     1.176
AT (mm^2 ns)             0.054
AT^2 (mm^2 ns^2)         0.046
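
The timing and area-time figures quoted in this section are related by simple arithmetic, sketched below. The maximum clock frequency is taken as the reciprocal of the critical-path delay, and the per-beam bandwidth is assumed to be half of that clock rate, which is what the quoted 369 MHz and 588 MHz figures suggest; the variable names are hypothetical.

```python
def max_frequency_mhz(t_cpd_ns):
    """Maximum clock frequency in MHz for a critical-path delay in ns."""
    return 1000.0 / t_cpd_ns

# Critical-path delays reported for the FPGA and ASIC realizations.
fpga_fmax_mhz = max_frequency_mhz(1.353)
asic_fmax_mhz = max_frequency_mhz(0.85)

# Assumption: the usable bandwidth per beam is half the clock frequency.
fpga_bandwidth_mhz = fpga_fmax_mhz / 2
asic_bandwidth_mhz = asic_fmax_mhz / 2

# Area-time metrics from the reported ASIC area (mm^2) and delay (ns).
area_mm2 = 0.064
at = area_mm2 * 0.85
at2 = area_mm2 * 0.85 ** 2

print(round(fpga_fmax_mhz, 2), round(asic_fmax_mhz / 1000, 3))
print(int(fpga_bandwidth_mhz), int(asic_bandwidth_mhz))
print(round(at, 3), round(at2, 3))
```

With these inputs the sketch recovers the reported maximum frequencies, the two bandwidth figures (after truncation to whole megahertz), and the AT and AT$^2$ values to three decimal places.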

4 Conclusion 

An 8-point multiplierless DFT approximation requiring 26 additions was proposed. Applications 
in receive mode RF multi-beamforming using a ULA of antennas include communication, radar, 
and radio astronomy. CMOS synthesis and FPGA implementations have indicated bandwidths of 
588 MHz and 369 MHz, respectively. The approximation is suitable for eight digital RF-beams, at 
low power. The DFT approximation allows FFT-like performance without multiplier hardware. 